Conversation

@tingyu215

Add TS-LLaVA, a training-free baseline for video LLMs.

@xjtupanda
Collaborator

Thanks for sharing! We've incorporated your work.
Please also consider citing:

@article{yin2024survey,
  title={A survey on multimodal large language models},
  author={Yin, Shukang and Fu, Chaoyou and Zhao, Sirui and Li, Ke and Sun, Xing and Xu, Tong and Chen, Enhong},
  journal={National Science Review},
  pages={nwae403},
  year={2024},
  publisher={Oxford University Press}
}

@article{yin2024t2vid,
  title={T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs},
  author={Yin, Shukang and Fu, Chaoyou and Zhao, Sirui and Shen, Yunhang and Ge, Chunjiang and Yang, Yan and Long, Zuwei and Dai, Yuhan and Xu, Tong and Sun, Xing and others},
  journal={arXiv preprint arXiv:2411.19951},
  year={2024}
}
